Explain the concepts and uses of a relational database management system
Identify the different types of key in a RDBMS
Understand the principles of normalisation on a relational database
Introduction to Databases
A Database Management System (DBMS) provides...
...an efficient, reliable, convenient and safe multi-user storage and access to massive amounts of persistent data.
A Database is ...
→ Massive
→ Persistent
→ Safe
→ Multi-user
→ Convenient
→ Efficient
→ Reliable
Key Concepts
→ Data Model
→ Schema vs Data
→ Data Definition Language (DDL)
→ Data Manipulation Language (DML)
Key People
→ DBMS Implementer
→ Database Designer
→ Database Application Developer
→ Database Administrator
Relational Databases
Relational Database Management System (RDBMS):
A database where data is organised into tables with defined relationships between them
Efficiency
Redundancy
Update Issues
Deletion
Standardisation
blank
Increased speed and storage
Efficiency
Redundancy
Update Issues
Deletion
Standardisation
blank
Less redundant and duplicated data
Efficiency
Redundancy
Update Issues
Deletion
Standardisation
blank
Fewer problems updating data
Efficiency
Redundancy
Update Issues
Deletion
Standardisation
blank
Less chance of deleting important data
Efficiency
Redundancy
Update Issues
Deletion
Standardisation
blank
Design follows consistent principles
The Relational Model
Employee
emp_id
emp_name
dpt_id
111
Alex
10
112
Liz
20
113
Joshua
10
114
Zoe
NULL
Department
dpt_id
dpt_name
10
Sales
20
HR
30
Operations
The Relational Model
Employee
emp_id
emp_name
dpt_id
111
Alex
10
112
Liz
20
113
Joshua
10
114
Zoe
NULL
Department
dpt_id
dpt_name
10
Sales
20
HR
30
Operations
The Relational Model
Employee
emp_id
emp_name
dpt_id
111
Alex
10
112
Liz
20
113
Joshua
10
114
Zoe
NULL
Department
dpt_id
dpt_name
10
Sales
20
HR
30
Operations
The Relational Model
Schema: A structural description of relations in a database
Instance: Data stored in database at a given point in time
NULL: the absence of a value
Employee
emp_id
emp_name
dpt_id
111
Alex
10
112
Liz
20
113
Joshua
10
114
Zoe
NULL
Department
dpt_id
dpt_name
10
Sales
20
HR
30
Operations
shippers_table
shipperID
companyName
phone
234
BigShippers
01302858888
235
DeliverForce
01302823444
236
ShipUK
01709282327
237
WeShip
01302339188
The Alternative...
Difficult to insert new data
Cannot modify existing data
Cannot delete information
Coach ID
Coach Name
No_Cohorts
Assistant
Apprentice
Cohort
Age
Line Manager
Program
Employer
1
Steph
5
Ashray
Adam
Standard
20
Eva
Data
Santander
1
Steph
5
Ashray
Natasha
Outliers
19
Boris
Data
Google
2
Ben
2
Ashray
Kingsley
Patch
23
Mila
Data
Visa
3
Tony
3
John
Grace
Sprite
21
Mila
Marketing
Visa
1
Steph
5
Ashray
Greta
Outliers
22
Isabel
Data
Facebook
4
Bruce
5
John
Alison
Movers
20
Henry
Marketing
Facebook
Keys
Primary Key
A unique identifier for each row in a table
coach_table
Coach Id
Coach Name
No_Cohorts
Assistant
1
Steph
5
Ashray
2
Ben
2
Ashray
Primary keys...
...cannot be NULL
...must be unique
...should rarely be changed
...given a new value when a new record is created
Foreign Key
A field in one table that is a primary key in another table . These keys enable relationships between tables and allow them to be joined.
apprentice_table
app_id
name
coach_id
16
Adam
1
23
Natasha
1
20
Kingsley
2
coach_table
coach_id
name
assistant
1
Steph
Ashray
2
Ben
Ashray
Composite Key
Two or more columns together acting as a primary key
assignment_table
apprentice_id
module_id
grade
coach_id
1
1
99
1
1
2
75
1
2
1
80
2
2
2
78
2
Activity
In groups discuss how this table can be redesigned into something more useable.
How many tables would you create?
What are the primary/foreign keys?
What information would be placed in each?
coach_table
program_table
cohort_table
apprentice_table
coach_id
coach_name
no_cohorts
apprenticeship
apprenticeship
TA
cohort_name
coach_id
app_name
age
department
apprenticeship
cohort
Normalisation
Normalisation is the process of structuring a database to reduce data redundancy and improve data integrity.
Why Normalise a Database?
Databases are more efficient
Data is prevented from being stored in multiple locations (insert anomaly)
Updates are prevented from being made to some data but not others (update anomaly)
Data is prevented from being lost when it is not supposed to be, or not deleted when it should (deleted anomaly)
Why Normalise a Database?
Data is more accurate
Storage space is reduced
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop/Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
One Sixty
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
First Normal Form
Second Normal Form
Third Normal Form
→ Each cell only contains one data point
→ Each column contains only one data subject
→ Columns should each have a unique name
→ Identification should not rely on the way the data is sorted
→ Compliant with 1NF
→ Each table contains relevant data
→ There are no partial dependencies
→ Compliant with 2NF
→ There are no transitive dependencies
First Normal Form
Each cell only contains one data point
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop/Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
One Sixty
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
First Normal Form
Each cell only contains one data point
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
One Sixty
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
Student
Number
Allie
07551502613
Bernard
07723871451, 07464998651
Charlotte
07818771127
Student
Number 1
Number 2
Allie
07551502613
Bernard
07723871451
07464998651
Charlotte
07818771127
Student
Number
Allie
07551502613
Bernard
07723871451
Bernard
07464998651
Charlotte
07818771127
First Normal Form
Each column contains only one data type
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
One Sixty
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
Student
Misc
Allie
Spaghetti
Bernard
Toyota
Charlotte
Blue
Student
Fav Food
Allie
Spaghetti
Bernard
Charlotte
First Normal Form
Each column contains only one data type
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
1.60
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
First Normal Form
Columns should each have a unique name
ID
Brand
Company
Supermarket
Country
Price
Rating
Rating
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
1.6
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
First Normal Form
Columns should each have a unique name
ID
Brand
Company
Supermarket
Country
Price
Rating_num
Rating_desc
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
1.6
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
First Normal Form
Identification should not rely on the way data is sorted
ID
Brand
Company
Supermarket
Country
Price
Rating_num
Rating_desc
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
1.6
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
Student
Favourite Soup
Allie
Tomato
Mushroom
Charlotte
Pea
Student
Favourite Soup
Allie
Tomato
Allie
Mushroom
Charlotte
Pea
First Normal Form
Second Normal Form
Third Normal Form
→ Each cell only contains one data point
→ Each column contains only one data subject
→ Columns should each have a unique name
→ Identification should not rely on the way the data is sorted
→ Compliant with 1NF
→ Each table contains relevant data
→ There are no partial dependencies
→ Compliant with 2NF
→ There are no transitive dependencies
Second Normal Form
Each table contains relevant data
ID
Brand
Company
Supermarket
Country
Price
Rating_num
Rating_desc
101
Aero
Nestle
Coop
UK
1.70
10
Excellent
101
Aero
Nestle
Tesco
UK
1.70
10
Excellent
101
Aero
Nestle
Sainsburys
UK
1.6
10
Excellent
102
Bounty
Mars
Walmart
USA
1.30
2
Bad
102
Bounty
Mars
Tesco
UK
1.20
2
Bad
102
Bounty
Mars
Sainsburys
UK
1.10
2
Bad
Second Normal Form
Each table contains relevant data
Compliant with 1NF
price_table
Supermarket
Price
Country
Coop
1.70
UK
Tesco
1.60
UK
Sainsburys
1.60
UK
Walmart
1.30
USA
Tesco
1.20
UK
Sainsburys
1.10
UK
chocolate_table
ID
Brand
Company
Rating_num
Rating_Desc
101
Aero
Nestle
10
Excellent
102
Bounty
Mars
2
Bad
Second Normal Form
Compliant with 1NF
price_table
Supermarket
Price
Country
Coop
1.70
UK
Tesco
1.60
UK
Sainsburys
1.60
UK
Walmart
1.30
USA
Tesco
1.20
UK
Sainsburys
1.10
UK
Compliant with 1NF
price_table
Chocolate_ID
Supermarket
Price
Country
101
Coop
1.70
UK
101
Tesco
1.60
UK
101
Sainsburys
1.60
UK
102
Walmart
1.30
USA
102
Tesco
1.20
UK
102
Sainsburys
1.10
UK
Second Normal Form
There are no partial dependencies
price_table
Chocolate_ID
Supermarket
Price
Country
101
Coop
1.70
UK
101
Tesco
1.60
UK
101
Sainsburys
1.60
UK
102
Walmart
1.30
USA
102
Tesco
1.20
UK
102
Sainsburys
1.10
UK
Partial dependencies occur when a table with a composite key has a field which is only dependent on part of it
Second Normal Form
There are no partial dependencies
There are no partial dependencies
price_table
Chocolate_ID
Supermarket
Price
Country
101
Coop
1.70
UK
101
Tesco
1.60
UK
101
Sainsburys
1.60
UK
102
Walmart
1.30
USA
102
Tesco
1.20
UK
102
Sainsburys
1.10
UK
price_table
Chocolate_ID
Supermarket
Price
101
Coop
1.70
101
Tesco
1.60
101
Sainsburys
1.60
102
Walmart
1.30
102
Tesco
1.20
102
Sainsburys
1.10
Second Normal Form
There are no partial dependencies
supermarket_table
Supermarket
Country
Tesco
UK
Walmart
USA
First Normal Form
Second Normal Form
Third Normal Form
→ Each cell only contains one data point
→ Each column contains only one data subject
→ Columns should each have a unique name
→ Identification should not rely on the way the data is sorted
→ Compliant with 1NF
→ Each table contains relevant data
→ There are no partial dependencies
→ Compliant with 2NF
→ There are no transitive dependencies
Third Normal Form
There are no transitive dependencies
chocolate_table
ID
Brand
Company
Rating_num
Rating_Desc
101
Aero
Nestle
10
Excellent
102
Bounty
Mars
2
Bad
Transitive dependencies occur when a field can be inferred from another field that not the primary key
Third Normal Form
There are no transitive dependencies
chocolate_table
ID
Brand
Company
Rating_Num
101
Aero
Nestle
10
102
Bounty
Mars
2
rating_table
Rating_Num
Rating_Desc
10
Excellent
2
Bad
Third Normal Form
Compliant with 2NF
Partial vs Transitive Dependency
Partial dependency occurs when a table has a composite key and a field is dependent on one part of it
City partially dependent on composite key
item_id
vendor
price
city
01
Tesco
1.70
London
Transitive Dependency occurs when a field can be inferred from another field that is not the primary key
Category can be inferred from population
county
population
category
Essex
1432000
large
Understanding how databases are designed, and why they are designed this way can make SQL - especially joins - easier to understand
You most likely will never need to design a full on database - that’s a very specific career progression. However, understanding the principles that went into the design of the databases you work with will help your understanding of them.
Activity
Convert this table step by step so it is in Third Normal Form
Also discuss:
What are the main benefits of Data Normalisation?
How does querying change because of First Normal Form?
transaction_table
product_table
store_table
county_table
id
date
item_no
store_id
item_no
item_description
case_cost
proof
store_id
store
store_address
county
county
population
NoSQL
NoSQL ("Not only SQL") is an alternative to traditional relational databases that can accomodate a wide variety of data models.
Document Orientated Database
A common NoSQL database where all instances of an object is stored in one document as opposed to spread across multiple tables. To access the data you reference the internal structure
Documents are organsied into collections (similar to RDBMS tables)
E.g. XML, JSON
JSON
{
contact{
"firstname":"Bob",
"lastname":"Smith",
"address":"5 Oak St",
"number":"07464998651"
}
}
XML
<contact> <firstname>Bob</firstname> <lastname>Smith</lastname> <address>5 Oak St</address> <number>07464998651</number> </contact>
Key Value Database
The simplest type of NoSQL database, where data (structured or unstructured) is mapped to a key and stored in one location. Data is extracted by referencing the key
Wrapping several keys in a JSON format creates a document
Columnar Database
Data is stored in columns instead of rows. SQL can be used to extract data quickly as it will go down the columns for information instead of scanning each row.
A column orientated database applies the row key to each item in a column, allowing it to precisely retrieve information from a select group of columns.
Graph Database
Information is stored in nodes with the edges representing the relationships between them.
Each node is a datapoint (e.g. a customer, product, group, etc) and edges define the relationships (e.g. person (node1) is a member of this group (node2).
Querying is fast as relationships between nodes have already been defined.
Graph Database
A node can contain any type of data (structured or unstructured) including tables
As these types of databases do not use indexing, data retrieval is fast as the query follows the edges to obtain the connected information
Facebook is an example of this where a user (node) is connected to other users and groups by edges
Key Features
No Fixed Schema - They do not have to follow historical rules making them more flexible
No Joins - NoSQL databases tend to store data in one large table, taking advantage of cheaper storage and faster processing where redundant data is no longer such a big issue
Size and Scale - No defined limits allows them to scale according to the resources available
Other Types of Database
Type of DB
Examples
Description
Relational
MSSQL, Postgres, MySQL
A very common way of managing data. It contains tables that can be joined to other tables through their relations.
Type of DB
Examples
Description
Hierarchical
IBM Information Management System (IMS), Windows Registry
Data is stored in a parent-children relationship nodes, in a tree like structure. The data is stored as a collection of fields where each field contains only one value. The records are linked to each other via a parent-children relationship. In a hierarchical database model, each child record has only one parent. A parent can have multiple children. To retrieve a field’s data, we need to traverse through each tree until the record is found.
Type of DB
Examples
Description
Network
Data Store (IDS), IDMS (Integrated Database Management System), Raima Database Manager, TurboIMAGE
Network database management systems (Network DBMSs) use a network structure to create relationship between entities. Network databases are mainly used on large digital computers. Network databases are hierarchical databases but unlike hierarchical databases where one node can have one parent only, a network node can have relationship with multiple entities. A network database looks more like a cobweb or interconnected network of records.
Type of DB
Examples
Description
Multidimensional
Microsoft Analysis Services, Hyperion Essbase, Cognos PowerCube
Multi-dimensional databases (MDBs) use the concept of a data cube (or hypercube) to represent the dimensions of data available to users (though physically they are stored as compressed multidimensional arrays with offset positioning). An MDB with three dimensions looks like a cube, whilst an MDB with four or more dimensions is called a hypercube, and becomes more difficult to visualise. They are designed to assist with decision support systems, and to optimise online analytical processing (OLAP) and data warehouse applications.
Type of DB
Examples
Description
Object-oriented
TORNADO, Gemstone, ObjectStore, GBase
Object-oriented databases use small, recyclable separated chunks of data called objects. The objects themselves are stored in the object-oriented database. Each object contains two things: the object itself, and the metadata that explains that object (i.e. it’s purpose, what it is and where it fits in).
Type of DB
Examples
Description
NoSQL
The ‘No’ stands for ‘Not Only’ SQL - these database seek to improve on standard SQL based DBMS by enhancing the forms of analytics available and changing the way that data is stored (no longer relational.) These databases are ideal for storing unstructured data.
Recap
Learning Objectives
Explain the concepts and uses of a relational database management system
Identify the different types of key in a RDBMS
Understand the principles of normalisation on a relational database
Assignment
Database Design
Use a work-related dataset to design your own relational database. You should describe the dataset, follow the normalisation steps and create an Entity Relationship Diagram (ERD).
Word Count
Max 1500 words
Deadline
3 weeks
Deliverables
Word Document, PowerPoint, Excel File, PDF, Lucid Chart